ABSTRACT



Context. It is crucial to develop a method for classifying objects detected
in deep surveys at infrared wavelengths. We specifically need a method to
separate galaxies from stars using only the infrared information to study the
properties of galaxies, e.g., to estimate the angular correlation function,
without introducing any additional bias.

Aims. We aim to separate stars and galaxies in the data from the AKARI North
Ecliptic Pole (NEP) Deep survey collected in nine AKARI/IRC bands from 2 to
24 μm that cover the near- and mid-infrared wavelengths (hereafter NIR and
MIR). We plan to estimate the correlation function for NIR and MIR galaxies
from a sample selected according to our criteria in future research.

Methods. We used support vector machines (SVM) to study the distribution of
stars and galaxies in the AKARI multicolor space. We defined the training
samples of these objects by calculating their infrared stellarity parameter
(sgc). We created the most efficient classifier and then tested it on the
whole sample. We confirmed the developed separation with auxiliary optical
data obtained by the Subaru telescope and by creating Euclidean normalized
number count plots.

Results. We obtain a 90% accuracy in pinpointing galaxies and 98% accuracy
for stars in infrared multicolor space with the infrared SVM classifier. The
source counts and comparison with the optical data (with a consistency of 65%
for selecting stars and 96% for galaxies) confirm that our star/galaxy
separation methods are reliable.

Conclusions. The infrared classifier derived with the SVM method based on
infrared sgc-selected training samples proves to be very efficient and
accurate in selecting stars and galaxies in deep surveys at infrared
wavelengths carried out without any previous target object selection.

Key words. infrared: galaxies - infrared: stars - galaxies: fundamental parameters - galaxies: statistics



1. Introduction 

The first proof that various types of extragalactic sources evolved
with cosmic epochs was delivered in the 1950s by surveys of extragalactic
radio sources and quasars, which revealed an excess of faint sources when
compared with uniform distribution models (e.g. Ryle & Scheuer 1955;
McVittie & Schusterman 1966). Interest in studying the deep Universe became
much greater after the discovery of the excess of faint blue galaxies in
optical passbands with photographic plates (e.g. Kron 1978; Williams et al.
1996; Ellis 1997). This revelation was followed by discoveries of excess
numbers of faint sources at early cosmic epochs in all wavelengths: in
X-rays by the ROSAT X-ray Observatory (e.g. Hasinger 1992), Chandra (e.g.
Hasinger 2002), and XMM-Newton (e.g. Sasseen et al. 2002); in mid- and
far-infrared by
ISO (e.g. Oliver 1996; Puget et al. 1999; Taniguchi et al. 1999;
Takeuchi et al. 2001), IRAS (e.g. Lonsdale & Hacking 1989;
Bertin et al. 1997), and later by the Spitzer Space Telescope (e.g.
Papovich et al. 2004; Dole et al. 2004; Frayer et al. 2006), and,
when observational techniques became available, in the submillimeter
range (e.g. Barger et al. 2001; Clements et al. 2010; Oliver et al. 2010;
Valiante et al. 2010). This was the first step toward modern studies of
evolutionary processes.

The theoretical motivation for studying cosmic evolution
arose from observations of the local Universe, which show a
very diverse galaxy distribution, whereas the early Universe
was almost uniform (e.g. Peebles 1971). Those complex patterns
are the result of tiny density fluctuations that interacted
and increased gravitationally as the Universe expanded. Galaxy
distribution can be studied in various statistical ways. Recent
cosmological probes provide more and more evidence that the
large-scale structure of the Universe was created according to
the hierarchical formation scenario. This describes the formation
and evolution of galaxies inside halos of dark matter, which
interacted gravitationally, resulting in their growth through
mergers (White & Rees 1978). Today the clustering process of dark
matter halos is adequately understood (e.g. Mo & White 1996;
Gao et al. 2005), but this is not the case for the clustering of
galaxies; establishing the link between dark matter halos and the
baryonic component is one of the most challenging tasks modern
cosmology has to deal with.

The AKARI satellite (previously known as ASTRO-F or
IRIS - InfraRed Imaging Surveyor) was designed to carry out
infrared observations with a sensitivity and resolution higher than
preceding missions. It was launched by JAXA's M-V-8 vehicle on
February 22, 2006, and, among other programs, it performed a deep
survey of the North Ecliptic Pole region (hereafter NEP), which
we aim to use to explore the mid-infrared properties of galaxies,
in particular the evolution of clustering. To achieve this
goal, we first have to select a proper sample of galaxies from
the collected data. For this purpose, we need to separate
the extragalactic sources from galactic objects (such as stars,
planetary nebulae, etc.) that contaminate our data. This might be
performed by means of follow-up observations, which are currently
ongoing, but they introduce additional bias in the detected
sources.

AKARI data have to be categorized based on the photometric
data because detailed spectroscopic follow-up observations
are expensive and much more time-consuming. The most widely
used tool in astronomy to distinguish stars and galaxies is the
color-color (CC) diagram. In particular, galaxies display 'redder'
colors, meaning that they radiate more strongly at longer
wavelengths, and stars are more 'blue' because they radiate strongly at
shorter wavelengths (e.g. Walker et al. 1989; Pollo et al. 2010).
However, the methods designed up to now cannot be applied
directly to NEP data, because they were developed for different
wavebands and shallower catalogs. Since different wavelengths
often imply observations of different physical processes and/or
different redshifts, we considered parameters obtained from several
different passbands, which will enable us to distinguish
sources in a multidimensional parameter space. In general,
classification methods are based on pattern recognition within the
data sets. For every object we have a vector describing its
characteristic features. We can use a mapping function, called a
classifier, to transfer feature vectors into discriminant ones, which
contain the likelihoods of the given object to belong to one of
the considered classes. Classification schemes heavily depend on
choosing a feature space, which should be selected in a way that
different classes occupy different volumes with minimal overlap.
When a survey is designed without a target object class
(i.e., the filter sets are not specifically chosen), unsupervised
classifiers (which work without previous class information
input) are a good tool to distinguish objects by, for example, using
cluster analysis (e.g. Hakkila et al. 2003). This process relies
on the visible features of the data. The classification is much
more obvious when we have some previous knowledge about
the objects appearing in the survey. Then we can use this knowledge
as an input to a supervised classifier (where we have a
feature/properties template of observable objects). Here we used
support vector machine (SVM) classifiers (Vapnik 1995).
SVMs are used to map input vectors non-linearly into a
high-dimensional parameter space and construct an optimal separating
hyperplane.

This work is organized as follows. In Sect. 2 we give a
brief description of the collected data together with the auxiliary
survey performed by the Subaru telescope (Imai et al. 2007), which
observed the NEP Deep field in optical wavelengths in filters
B, V, R, i', z'. Section 3 describes the sample and parameter space
selection process. The application of the SVM method and the
results are presented in Sect. 4. In Sect. 5 its accuracy is tested by
comparing our results with the separation made for the optical survey
of the NEP region performed by the Subaru telescope, and by
preparing flux distribution plots for objects divided
according to the established star/galaxy methods. A
summary and conclusions are given in Sect. 6.

2. The data 

The NEP Deep sky survey covers an area of 0.4 sq. deg around
the North Ecliptic Pole (Matsuhara et al. 2006). The data were
obtained by the Infra-red Camera (IRC) (Onaka et al. 2007)
through nine near- and mid-infrared (NIR and MIR) filters,
centered at 2 μm (N2), 3 μm (N3), 4 μm (N4), 7 μm (S7), 9 μm (S9W),
11 μm (S11), 15 μm (L15), 18 μm (L18W), and 24 μm (L24), where
W indicates that the bandwidths are wider than the others. The
long exposure times (from 1047 s for the N2 filter to 261.8 s for the
L24 filter) made it possible to reach very deep into this region.
Table 1 summarizes the survey, where λ_ref is the reference
wavelength, N_sources is the total number of detected sources in a
specific bandpass, mag_lim is the limiting magnitude of detected
objects in a specific filter, and zero point stands for the magnitude
zero point used in brightness conversion procedures. The point
spread function (PSF) has a beam size in FWHM of 5 arcsec,
which makes AKARI's imaging superior to other infrared satellites.
The source extraction on FITS images was made with the
SExtractor software (Bertin & Arnouts 1996). A source was
assumed to be detected if it had a minimum of five contiguous pixels
above 1.65 times the RMS fluctuations. Instead of allowing
the program to estimate the background, weight maps were used.
Photometry was carried out using SExtractor's MAG_AUTO variable
elliptical aperture with these aperture parameters: the Kron
factor and the minimum radius were set to 2.5 and 3.5, respectively.
The magnitude zero points were derived from observations of standard
stars (Tanabe et al. 2008) and were used to convert counts to
magnitude by the photometry program. The number of sources
detected in individual filters differs significantly from filter to
filter: far more sources are detected in the near-infrared than in the
mid-infrared. The photometry resulted in a catalog
depth of 26.86 mag at 2.4 μm (N2 filter). The results of this
procedure can be downloaded from the official AKARI Researchers
Web Page (footnote 1); however, we performed independent photometry
measurements with SExtractor, and the parameters obtained from this
run were used in the subsequent analysis, after confirming that
the basic results were consistent with the original catalogs.
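The conversion from counts to magnitudes mentioned above can be illustrated with a minimal sketch. It assumes the usual relation m = ZP - 2.5 log10(counts), with the zero point taken from Table 1; the function name and the example count value are hypothetical, and any exposure-time normalization applied in the actual reduction is not modeled here.

```python
import math

def counts_to_mag(counts, zero_point):
    """Convert background-subtracted counts to a magnitude.

    Assumes the standard relation m = ZP - 2.5 * log10(counts).
    """
    if counts <= 0:
        raise ValueError("counts must be positive")
    return zero_point - 2.5 * math.log10(counts)

# Zero point of the N2 band as listed in Table 1.
ZP_N2 = 24.82

# Hypothetical source with 100 counts in N2 (illustrative value only).
mag_n2 = counts_to_mag(100.0, ZP_N2)
print(f"N2 = {mag_n2:.2f} mag")
```

A source with exactly one count has a magnitude equal to the zero point, which is a quick sanity check on the sign convention.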

2.1. Subaru/Suprime-cam optical auxiliary survey of the 
AKARI NEP-Deep field 

To validate our method for classifying sources, we
checked it against observations that were not made in the infrared.
The best way to prove the efficiency of the presented star-galaxy
separation method is to incorporate auxiliary multiwavelength
data. The Subaru telescope observed the NEP-Deep region in
B, V, R, i', z' filters covering ~ 0.25 deg² (Imai et al. 2007) in
the field of view of the Suprime-cam (S-cam) (Miyazaki et al.
2002), reaching limiting magnitudes of z_AB ~ 26. We cross-matched
the optical data obtained by the Subaru telescope with
the infrared catalogs, searching for counterparts within a
radius of 5 arcsec, motivated by the PSF of images and known


' http://www.ir.isas.jaxa.jp/ASTRO-F/Observation/, however, we 
have performed independent photometry measurements by 
Table 1. Properties of the NEP Deep survey based on
Lorente et al. (2008) and Wada et al. (2008).

Band    λ_ref [μm]   N_sources   Mag_lim   Zero point   Exp. t [s]
N2       2.4         23325       26.86     24.82        1047
N3       3.2         26180       25.95     25.02        1047
N4       4.1         26332       25.03     25.32         981.8
S7       7.0          8650       23.28     23.96         245.5
S9W      9.0          8516       22.24     24.51         212.7
S11     11.0          8769       22.93     24.19         229.1
L15     15.0         10611       24.56     23.57         278.2
L18W    18.0         10782       23.50     23.83         278.2
L24     24.0          5704       24.59     22.28         261.8



resolution of the detector. The possibility of any false
identifications is assessed in Section 4.1. After integrating the optical
data with infrared data we obtained a catalog consisting of 9699
sources in total, with 8768 optical counterparts for NIR
wavelengths and 3252 in MIR. Below we use these data to test the
performance of all presented methods of separation based solely
on infrared data.
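The positional cross-match within a 5 arcsec radius can be sketched as follows. This is a brute-force illustration, not the procedure actually used for the catalogs: the function names, the nearest-neighbor tie-breaking, and the example coordinates are assumptions, and a real catalog of this size would normally use a spatial index.

```python
import math

def angular_sep_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation (haversine formula); inputs in degrees, output in arcsec."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    dra, ddec = ra2 - ra1, dec2 - dec1
    a = math.sin(ddec / 2) ** 2 + math.cos(dec1) * math.cos(dec2) * math.sin(dra / 2) ** 2
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a)))) * 3600.0

def cross_match(ir_sources, optical_sources, radius_arcsec=5.0):
    """For each IR source, return the index of the nearest optical source
    within radius_arcsec, or None if there is no counterpart."""
    matches = []
    for ra_i, dec_i in ir_sources:
        best, best_sep = None, radius_arcsec
        for j, (ra_o, dec_o) in enumerate(optical_sources):
            sep = angular_sep_arcsec(ra_i, dec_i, ra_o, dec_o)
            if sep <= best_sep:
                best, best_sep = j, sep
        matches.append(best)
    return matches

# Hypothetical positions (degrees) near the North Ecliptic Pole.
ir = [(270.0, 66.5), (270.1, 66.6)]
optical = [(270.0, 66.5 + 2.0 / 3600.0), (269.0, 66.0)]
print(cross_match(ir, optical))
```

Keeping only the nearest counterpart inside the radius is one simple design choice; sources with several candidates inside 5 arcsec are exactly the cases where false identifications become a concern.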



3. Sample and parameter selection 

The subsequent multivariate analysis was performed on a 
merged catalog of objects that were detected in all AKARI IRC 
passbands, which eliminates any possibility of including dropout 
objects. 

As stated before, for a supervised method of classification we
must adopt catalogs of known astronomical objects. Since we
aim to develop a method based solely on IR data, we chose to
use the stellarity parameter (hereafter sgc), an output classifier
for objects based on the neural network output (Gurney 1997),
which is referred to as the CLASS_STAR parameter, as a
distinguishing value between the two desired classes.

As one of the possible star/galaxy separation methods, sgc
was calculated by the SExtractor software (Bertin & Arnouts 1996)
for each source. Detectors produce astronomical images
with similar linear intensity scales with good precision over
large scales up to the point where saturation takes place;
therefore, correctly sampled images can be roughly described by
pixel scale, depth (signal-to-noise ratio at a given magnitude),
and seeing. To provide the best possible classifier, input
parameters should be independent of those characteristics of the
exposure. Simple estimators in a two-dimensional space such as
magnitude-isophotal area (Reid & Gilmore 1982),
magnitude-peak intensity (Jones et al. 1991), or magnitude-surface
brightness (Harmon & Mamon 1993) are the simplest ways of
separating stars from galaxies. However, the sgc calculation uses
ten parameters: eight isophotal areas (using more isophotal areas
than the lowest one enables the classifier to be sensitive to dim
objects), peak intensity (if the relative uncertainty of maximum
intensity is high enough, the contrast between the two classes
is worse), and seeing, which is used as a 'control' parameter.
The network takes the isophotal areas in units of squared seeing
FWHM, which ensures that there will be no need for the
information about the pixel scale. The peak intensity is given
in units of extraction threshold to remove the depth information.
To obtain an even more reliable classification outcome,
which is independent of noise, image distortions, and the influence
of close objects, the SExtractor creators did not include any
elongation measurements in the CLASS_STAR computation. Its




Fig. 1. Stellarity parameter (sgc) histogram for N2 (solid line),
N4 (dotted line), S7 (dashed line), S11 (dash-dotted line), and
L18 (dash-triple-dotted line) images. Here, the abscissa is the sgc
parameter with intervals of 0.05 and the ordinate is the number
of objects in each bin.



value varies between 0 and 1: 1 stands for a star-like object, and
0 for a galaxy, or rather a non-stellar extended object. However,
since SExtractor is optimized for optical data, it is not obvious
that this parameter would work for infrared observations.

Fig. 1 presents a histogram of sgc values for different
wavelengths. For clarity we show distributions for five filters
only, two for NIR-N (solid and dotted lines), two for MIR-S
(dashed and dash-dotted lines), and one for MIR-L (dash-triple-dotted
line). Evidently, for NIR filters alone, the majority of
sources are unambiguously classified as extended objects (with
the value 0), but a fraction of strictly star-like objects is also
detected (with the sgc value ~ 1) (see Fig. 1). However, the
remaining S and L-filter-based sgc measurements seem to indicate
that very few or no stars are visible in these filters, since
we have only one concentration around sgc = 0, i.e., clearly
extended objects. The sgc histograms for MIR wavelengths lead
us to conclude that the interpretation of this parameter at the
longer wavelengths will not be useful for object classification,
unlike at NIR. Moreover, all passbands have a local maximum
around a value of sgc = 0.5, which means that for the applied
algorithm the sources look neither clearly extended nor clearly
star-like. Because of this ambiguity we decided to use the
supervised approach and train the classifier on small but
clearly determined samples of stars and galaxies instead of using
sgc itself as a separator. The training samples of star-like
and extended sources are constructed in such a way that objects
with sgc values between 0 and 0.05 are treated as galaxies, and
those between 0.95 and 1 are treated as stars, which resulted in
training samples that consist of 825 galaxies and 532 stars.
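The sgc cuts that define the training samples can be written down as a short sketch. The catalog layout (a list of dictionaries with an `sgc` key), the function name, and the example values are assumptions made for illustration only.

```python
def select_training_samples(catalog, gal_max=0.05, star_min=0.95):
    """Split a catalog into training galaxies (0 <= sgc <= gal_max) and
    training stars (star_min <= sgc <= 1). Sources with intermediate sgc
    values are ambiguous and are left out of both samples."""
    galaxies = [src for src in catalog if 0.0 <= src["sgc"] <= gal_max]
    stars = [src for src in catalog if star_min <= src["sgc"] <= 1.0]
    return galaxies, stars

# Hypothetical catalog entries (illustrative values only).
catalog = [
    {"id": 1, "sgc": 0.01},   # clearly extended
    {"id": 2, "sgc": 0.50},   # ambiguous, excluded from training
    {"id": 3, "sgc": 0.98},   # clearly star-like
    {"id": 4, "sgc": 0.04},   # clearly extended
]
train_galaxies, train_stars = select_training_samples(catalog)
print(len(train_galaxies), len(train_stars))
```

Leaving the ambiguous sources (for example those near sgc = 0.5) out of training is what allows the classifier to decide on them later from their colors alone.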

All possible color combinations are equally significant,
therefore we kept the dimensionality of the parameter space low
by choosing the infrared color indexes as follows: N2 - N3,
N3 - N4, N4 - S7, S7 - S11, S11 - L15, L15 - L18. Table 2
gives the statistical properties of the training samples, where
columns list the mean values of parameters with their standard
errors for selected samples of stars and galaxies. Clearly, the
mean values of parameters differ for different classes of sources.
The most striking feature is the S7 - S11 value for galaxies,
which is drastically higher than for stars. The color index
values for stars are systematically smaller. However, while in the
NIR wavelength regime the differences are moderate, the
discrepancy between them becomes vast when moving into longer
wavelengths. On average, sample galaxies have the lowest flux at
7 μm. For the N4 - S7 index the difference between the two classes
is marginal, and for S7 - S11 it is most obvious.
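As a minimal sketch, the six-dimensional color vector can be built from the seven band magnitudes by differencing adjacent bands in order of increasing wavelength. The dictionary layout and the example magnitudes are hypothetical.

```python
# Bands used for the color space, ordered by wavelength.
BANDS = ["N2", "N3", "N4", "S7", "S11", "L15", "L18"]

def color_indices(mags):
    """Return the six adjacent-band colors
    [N2-N3, N3-N4, N4-S7, S7-S11, S11-L15, L15-L18]
    from a dict mapping band name to magnitude."""
    return [mags[a] - mags[b] for a, b in zip(BANDS[:-1], BANDS[1:])]

# Hypothetical magnitudes of one source (illustrative values only).
source = {"N2": 18.0, "N3": 18.2, "N4": 18.5, "S7": 18.6,
          "S11": 17.9, "L15": 17.5, "L18": 17.4}
colors = color_indices(source)
print([round(c, 2) for c in colors])
```

Seven magnitudes give exactly six independent adjacent-band colors, so any other color combination is a sum of these and adds no new information to the feature vector.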

Table 2. Mean values of parameters for the training samples in
the multicolor space.

Parameter    Galaxies         Stars
N2 - N3      -0.05 ± 0.67     -0.32 ± 0.59
N3 - N4      -0.11 ± 0.58     -0.53 ± 0.56
N4 - S7      -0.15 ± 0.74     -0.17 ± 0.97
S7 - S11      0.79 ± 0.89      0.00 ± 1.01
S11 - L15     0.47 ± 0.89     -0.16 ± 1.08
L15 - L18     0.03 ± 0.67     -0.12 ± 0.84



4. Support vector machines 

Support vector machines are a supervised method, based on
kernel algorithms (Shawe-Taylor & Cristianini 2004), for extracting
structures from data, and have proven themselves to be of
great use in astronomy (e.g. Woźniak et al. 2004; Zhang & Zhao
2004; Huertas-Company et al. 2008) due to their ability to deal
with multi-dimensional data and their high accuracy.

To train the SVM algorithm means to supply a feature vector
for each object of the training sample, i.e., quantities that
describe the properties of a given class of objects. We therefore
mapped the input data from the input space X onto a feature space
H using a non-linear function φ: X → H. In the parameter space
H, the decision boundary is determined by a function that can be
written as

$$f(x) = \sum_{i=1}^{n} \alpha_i \, k(x_i, x) + b, \qquad (1)$$

where the sum runs over the n training vectors x_i, k(x, x') is the
kernel function returning an inner product of the mapped vectors,
α_i is a linear coefficient, and b is a perpendicular distance called
bias, which translates the boundary in a given direction.

The shortest distance from the boundary to the closest points
belonging to the separate classes (support vectors) is called the
margin, and the algorithm searches for a hyperplane that maximizes
it. The training samples of stars and galaxies were used
to train an SVM with the Gaussian radial basis kernel function:

$$k(x, x') = \exp\left(-\gamma \, \|x - x'\|^{2}\right), \qquad (2)$$

where γ is the adjustable kernel width parameter, which is
responsible for the curvature of the decision surface. Since the
data are not clearly separable, we added a parameter (C), which
controls the trade-off between misclassification and large
margins. For a more detailed description we refer the reader to
Hsu et al. (2003) or Cristianini & Shawe-Taylor (2000).
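The kernel expansion of the decision function and the Gaussian kernel described above can be sketched in a few lines. This only evaluates an already trained classifier; the support vectors, coefficients, bias, and the sign convention (positive values labeled as stars) are hypothetical choices for illustration and are not taken from the trained model.

```python
import math

def rbf_kernel(x, y, gamma):
    """Gaussian radial basis kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

def decision_function(x, support_vectors, alphas, bias, gamma):
    """f(x) = sum_i alpha_i * k(x_i, x) + b over the support vectors."""
    return sum(a * rbf_kernel(sv, x, gamma)
               for sv, a in zip(support_vectors, alphas)) + bias

def classify(x, support_vectors, alphas, bias, gamma):
    """Assumed convention: f(x) >= 0 is a star, f(x) < 0 is a galaxy."""
    f = decision_function(x, support_vectors, alphas, bias, gamma)
    return "star" if f >= 0 else "galaxy"

# Hypothetical two-dimensional toy model (not the fitted classifier).
svs = [[0.0, 0.0], [2.0, 0.0]]
alphas = [1.0, -1.0]   # signed coefficients, alpha_i already includes the label
print(classify([0.1, 0.0], svs, alphas, bias=0.0, gamma=1.0))
print(classify([1.9, 0.0], svs, alphas, bias=0.0, gamma=1.0))
```

A larger γ makes each support vector influence a smaller neighborhood, which is why it controls the curvature of the decision surface.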

AKARI IRC photometry provides us with nine-dimensional
datasets. We reduced the number of dimensions by removing
measurements in two filters, S9W and L24, since the amount of
data collected through these passbands is significantly lower
than in the rest, rendering the resulting cross-matched catalog
statistically insufficient. With seven different flux measurements
we built a six-dimensional parameter space using color
indexes. We used two training samples containing stars and
galaxies chosen according to their sgc value measured in NIR
to train the SVM and obtain its classifier.

The two parameters γ and C are not known beforehand,
and it is necessary to find the best values to obtain accurate
results. To tune these parameters for the best performance, we
performed a grid-search with values from 10"^ to 10^ using a 
ten-fold cross-validation technique. To that end we divided the
full training set into ten subsets of equal size. In each iteration,
nine subsets were used to train the classification model, which was
then tested on the remaining subset, and we counted the TS (true
star: when an object classified as a star in the training set is
classified as a star by SVM), the TG (true galaxy: when a galaxy
from a training sample is classified as a galaxy by SVM), the FG
(false galaxy: when a source from a star training sample is
classified as a galaxy by SVM), and the FS (false star: when an
object from a galaxy training sample is classified as a star by
SVM). After concluding all iterations we summed the values and
calculated the accuracy (Acc), defined as

$$\mathrm{Acc} = \frac{TS + TG}{TS + FS + TG + FG}, \qquad (3)$$

the true star rate (TSR), defined as

$$\mathrm{TSR} = \frac{TS}{TS + FG}, \qquad (4)$$

and the true galaxy rate (TGR), defined as

$$\mathrm{TGR} = \frac{TG}{TG + FS}. \qquad (5)$$




The procedure resulted in selecting the pair (γ, C) equal to 1 and
10"^, which provides a total accuracy of 93%. The final results 
for TGR and TSR are summarized in Table 3.
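The cross-validation bookkeeping and the three rates can be sketched as below. This is a generic outline under stated assumptions: `train_fn` is a hypothetical callable that returns a prediction function (any SVM implementation could be plugged in), the interleaved fold assignment is one arbitrary choice, and the grid of (γ, C) values is left out. The final lines evaluate the rates for the counts listed in Table 3.

```python
def kfold_indices(n, k=10):
    """Split indices 0..n-1 into k interleaved folds of (nearly) equal size."""
    return [list(range(i, n, k)) for i in range(k)]

def tally(true_labels, predicted_labels):
    """Count TS, TG, FS, FG for labels 'star' / 'galaxy'."""
    counts = {"TS": 0, "TG": 0, "FS": 0, "FG": 0}
    for truth, pred in zip(true_labels, predicted_labels):
        if truth == "star":
            counts["TS" if pred == "star" else "FG"] += 1
        else:
            counts["TG" if pred == "galaxy" else "FS"] += 1
    return counts

def rates(c):
    """Return (Acc, TSR, TGR) from the summed counts."""
    acc = (c["TS"] + c["TG"]) / (c["TS"] + c["FS"] + c["TG"] + c["FG"])
    tsr = c["TS"] / (c["TS"] + c["FG"])
    tgr = c["TG"] / (c["TG"] + c["FS"])
    return acc, tsr, tgr

def cross_validate(features, labels, train_fn, k=10):
    """Train on k-1 folds, test on the held-out fold, and sum the counts.
    train_fn(X, y) is a hypothetical trainer returning predict(x) -> label."""
    total = {"TS": 0, "TG": 0, "FS": 0, "FG": 0}
    n = len(labels)
    for fold in kfold_indices(n, k):
        held_out = set(fold)
        train_idx = [i for i in range(n) if i not in held_out]
        predict = train_fn([features[i] for i in train_idx],
                           [labels[i] for i in train_idx])
        fold_counts = tally([labels[i] for i in fold],
                            [predict(features[i]) for i in fold])
        for key in total:
            total[key] += fold_counts[key]
    return total

# Counts as listed in Table 3.
table3 = {"TS": 441, "FS": 91, "FG": 11, "TG": 817}
acc, tsr, tgr = rates(table3)
print(f"Acc = {acc:.3f}, TSR = {tsr:.3f}, TGR = {tgr:.3f}")
```

In a grid search, `cross_validate` would be called once per (γ, C) pair and the pair with the highest accuracy kept.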



Fig. 2. Accuracy rate as a function of magnitude for the N2 passband
(horizontal axis: N2 [mag]). Error bars represent Poisson uncertainty.



Fig. 2 presents the accuracy rate (Eq. 3) as a function of
magnitude for the N2 filter, calculated in bins of 0.5 mag. The
accuracy, though still maintaining high values, decreases toward
fainter magnitudes beginning from ~ 15 mag, which may
be a projection effect of bright galactic stars (which fade away
in longer passbands) blocking and/or blending with the fainter
extragalactic objects. We explore the contamination problem below.
At the faintest end the accuracy rises again (~ 20.5 mag)
because in this range we have only a few objects, which are mostly
dim but unambiguously galaxies.

Next, we applied the classifier to the whole test sample
(2539 objects). We obtained 1657 objects classified as galaxies
and 877 objects classified as stars. As stated before, CC diagrams
are the most commonly used tools for object recognition.
Therefore we can assume that NIR colors are useful for selecting
stars in the survey, because they radiate strongly in narrow
passbands of short wavelengths, while galaxies possess much
redder colors with a stronger color discrepancy, and appear as a
dispersed cloud because of the variety of components comprising
the spectra and their distance to the observer. To test this
hypothesis and compare it with the resulting SVM classification
we projected the constructed 6D multicolor space into a standard
two-color space. Figures 3, 4, and 5 present the results of the
classification for the whole sample, which show the division
between the two classes. As predicted, stars occupy compact regions
of the diagrams since they have narrow emission, while galaxies
tend to spread over a wider range of color index values. The
regions where the contours for stars and galaxies overlap indicate
the projection of the classes' decision boundary margin. We
fitted a linear function to the points in the 2D color space lying
on the boundary hyperplane to mark the separation; the coefficients
are listed in Table 4.

Table 3. Performance of the trained classifier to separate stars
from galaxies.

class            SVM star    SVM galaxy
actual star      441 (TS)    11 (FG)
actual galaxy    91 (FS)     817 (TG)
accuracy (%)     98 (TSR)    90 (TGR)

star's) position in the multicolor space has a separation boundary
distance smaller than the error bar, it was treated as a possible
misclassification. The contamination was estimated to be
13.16% for the galaxy catalog and 9.01% for the star catalog.
What is more, the misclassifications usually display an sgc value
of ~ 0.5, which confirms that these objects have to be treated
with caution. When viewed on the FITS images, they appear to
be either interacting systems and/or blended objects. Because
we aim at having a pure galaxy or stellar sample, these sources
should be removed.



Fig. 3. Projection of the SVM classification from multicolor
space onto the N2 - N3 and S7 - S11 parameter space. Solid
contours represent the occupancy zone for stars, dashed contours
for galaxies.



Table 4. Coefficients of the linear fit to stars and galaxies lying
on the 2D projection of the boundary hyperplane.

color      color        a                b
N2 - N3    N3 - N4      -0.14 ± 0.18      0.08 ± 0.10
N2 - N3    N4 - S7       0.20 ± 0.12      0.18 ± 0.09
N2 - N3    S7 - S11     -0.34 ± 0.10      0.49 ± 0.13
N2 - N3    S11 - L15     0.33 ± 0.10      0.02 ± 0.08
N2 - N3    L15 - L18     0.02 ± 0.16      0.13 ± 0.08
N3 - N4    N4 - S7       0.18 ± 0.08     -0.29 ± 0.06
N3 - N4    S7 - S11     -0.04 ± 0.08     -0.29 ± 0.10
N3 - N4    S11 - L15    -0.11 ± 0.08     -0.33 ± 0.06
N3 - N4    L15 - L18     0.21 ± 0.11     -0.33 ± 0.06
N4 - S7    S7 - S11     -0.46 ± 0.10

Based on the number of objects belonging to the two classes
that lie within the hyperplane margin, we estimated the
contamination of our samples. If an SVM-classified galaxy's (or



Fig. 4. Projection of the SVM classification from multicolor
space onto the S7 - S11 and L15 - L18 parameter space. Solid
contours represent the occupancy zone for stars, dashed contours
for galaxies.

Fig. 5. Projection of the SVM classification from multicolor
space onto the N2 - N3 and L15 - L18 parameter space. Solid
contours represent the occupancy zone for stars, dashed contours
for galaxies.



5. Methodology verification 

5.1. Subaru/Suprime-cam optical auxiliary survey of the 
AKARI NEP-Deep field 

To check the validity of our star-galaxy separation methods, we
created an integrated optical-infrared catalog and assigned an
optical sgc parameter to all test objects. When we move away from
the optical part of the spectrum, the hot, blue stars fade away and
cooler red stars come into view. Therefore it is possible that
a high fraction of stars are classified differently in the IR and
optical observations. Stellar NIR emission is dominated by red
giants and low-mass red dwarfs. When we shift into MIR, the cool
stars fade away and the dust-enshrouded stars emerge. In this regime
we can observe even cooler objects, such as planets or asteroids,
but considering the FOV and the depth of the survey, their
contribution is not statistically significant.

With the same selection criteria as for the input samples, we
created new training samples: 105 stars (with 0.95 < sgc < 1) and
226 galaxies (with 0 < sgc < 0.5). The mean values of the new
training samples are summarized in Table 5.



Table 5. Mean values of parameters for the multicolor samples
based on optical sgc parameter.

Properties    Galaxies         Stars
N2 - N3       -0.09 ± 0.88     -0.49 ± 0.73
N3 - N4        0.38 ± 1.12      0.07 ± 0.94
N4 - S7        0.62 ± 0.77      0.35 ± 0.70
S7 - S11       0.82 ± 0.83      0.31 ± 0.84
S11 - L15     -0.22 ± 5.44      0.06 ± 0.52
L15 - L18      0.49 ± 5.44      0.13 ± 0.54



As expected, the farther we move into long wavelengths, the
dimmer the stars become. The galaxy sample, on the other hand, has
two minima, at 3 μm and 15 μm. We compared the results with
those obtained from the first classifier. If we assume that a
correctly classified object possesses the same SVM class in both
IR and optical sgc based classifications, then the total
classification has a 91% accuracy with the TSR and TGR equal to
65% and 96%, respectively. The galaxy classification is very
efficient. The efficiency is still good for stars, but it
is lower than for galaxies. The reason is that young and hot stars
that are easily detected in optical wavelengths gradually disappear
when they are observed in longer wavelengths. Therefore,
their infrared counterparts could possibly be different objects,
invisible in optical wavelengths due to their close proximity to
bright stars. This also explains the high efficiency of selecting
stars based just on the IR criteria. Old, cool stars or stars with
protoplanetary disks, which are hardly detectable in optical
passbands, have their peak radiation in the NIR. Therefore it is safe
to conclude that the classifier works very well for the infrared
classification.

5.2. Number counts 

In this section we present the Euclidean normalized number
counts for all considered sources and the resulting counts
for the separate classes. We compare stellar counts with the Faint
Source Model (Arendt et al. 1998) (hereafter FSM) to assess the
consistency of our results with the theoretical predictions. Since
its primary goal was to measure the cosmic infrared background
at NIR and MIR wavelengths, FSM was created as a means to
remove the strong contributions of foreground emission, which
originates within our Galaxy. At NIR wavelengths the contribution
consists mainly of starlight, the majority of which can
be resolved into point sources. However, a significant number of
stars are blended into the diffuse background. The FSM was
constructed specifically to solve this problem. The MIR (and FIR)
emission is dominated by thermal emission from dust residing
in the interstellar medium and in more compact star-forming regions.
At wavelengths longer than 12 μm, the faint source
emission contributes less than 40% of the observed brightness
toward the inner Galaxy at low latitudes and drastically decreases
(to 1%) for higher galactic latitudes with increasing
wavelength. Therefore the FSM for MIR can only follow counts
in the inner Galaxy.

The measured flux in analog-to-digital units (ADU) was con- 
verted to yuJy by multiplying the counts by a corresponding con - 
version factor calculated for every filter (Lorente et al.i tZOOSh . 
Here we denote a flux density Sy at wavelength A jum as Sa, but 
the units are [Jy]. The extragalactic Euclidean normalized differ- 
ential source counts display a flat distribution at bright fluxes. If 
any evolution is present in the observed sample, it will be indi- 
cated by a change in the slope of counts at fainter fluxes. If a cer- 
tain population of galaxies is evolving negatively (i.e., dimming 
with time), the counts can be lower than the Euclidean slope. 
On the other hand, if the evolution is positive, the count slope 
is steeper. At the faintest fluxes the counts will suddenly drop 
because of the dimming effect of cosmological redshift. 
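
As an illustration of the quantity discussed above, the following
minimal Python sketch converts fluxes from ADU, bins them, and
multiplies the differential counts dN/dS by S^2.5. It is not the
pipeline used in this work: the function names, the logarithmic
bins, the geometric bin centers, the area argument, the conversion
factor, and the propagation of the Poisson error to log10 units are
all assumptions made for the example. The synthetic sample follows
N(>S) proportional to S^-1.5, the non-evolving Euclidean case, so
its normalized counts should come out approximately flat.

```python
import math


def adu_to_ujy(flux_adu, conversion_factor):
    """Convert a flux in ADU to microjansky with a per-filter factor (hypothetical value)."""
    return flux_adu * conversion_factor


def euclidean_sample(n_sources, s_min):
    """Deterministic fluxes following N(>S) proportional to S^-1.5 (non-evolving Euclidean case)."""
    return [s_min * ((i + 0.5) / n_sources) ** (-2.0 / 3.0) for i in range(n_sources)]


def euclidean_normalized_counts(fluxes, bin_edges, area):
    """Return (S_center, dN/dS * S^2.5, log10 Poisson error) for every non-empty bin."""
    rows = []
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        n = sum(1 for s in fluxes if lo <= s < hi)
        if n == 0:
            continue  # empty bins cannot be shown on a logarithmic axis
        s_center = math.sqrt(lo * hi)         # geometric center of the bin
        dn_ds = n / ((hi - lo) * area)        # differential counts per unit flux and area
        normalized = dn_ds * s_center ** 2.5  # Euclidean normalization
        log_err = 1.0 / (math.sqrt(n) * math.log(10.0))  # sigma of log10(N) for Poisson N
        rows.append((s_center, normalized, log_err))
    return rows


# Usage: a non-evolving Euclidean population gives a nearly flat curve.
edges = [10.0 ** (-2.0 + 0.5 * k) for k in range(5)]  # 0.01 to 1 mJy (assumed range)
sample = euclidean_sample(100000, 0.01)
for s_center, counts, log_err in euclidean_normalized_counts(sample, edges, 1.0):
    print(f"S = {s_center:.4f} mJy  counts = {counts:.1f}  log-error = {log_err:.4f}")
```

In this picture, a rise of the normalized counts toward fainter
fluxes corresponds to the positive evolution described above, and a
drop at the faintest bins to cosmological dimming or incompleteness.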

Figures 6, 7, and 8 present the Euclidean normalized differential
number counts in the N2, S7, and L18 filters for all test
objects with an assigned star or galaxy tag according to the
obtained classifier. Squares represent total counts, asterisks
represent stellar counts, and circles indicate extragalactic counts.

In the counts for NIR wavelengths (e.g., Fig. 6) we can see
that the abundance of stars in the data is so high that the total
number counts alone provide no distinction between the two
populations. The raw counts show high consistency with the FSM,
which shows that stars indeed dominate at the bright end of the
counts. The stellar counts closely follow the theoretical
predictions, and the extragalactic counts display distinctive
features: a bump in the counts in the N2 filter is visible at
S_ν ~ 3 mJy, together with an upturn at the brightest end. At
fainter fluxes, S_ν < 1 mJy, the counts in these passbands
slightly increase, signaling positive source evolution. They reach
a maximum value at S_ν ~ 0.8 mJy and tail off at the faintest end,
possibly because of cosmological dimming and/or catalog
incompleteness.

For MIR-S bands (e.g., Fig. 7) the raw counts still contain
a fraction of stellar sources. After separating stars from galaxies,
the expected flat distribution in extragalactic counts emerges. For
S_ν < 0.1 mJy the extragalactic counts have a maximum and then
start to tail off.

In MIR-L bands (e.g., Fig. 8) the shape of the total counts has
changed, with pronounced evolution at fainter fluxes. The
extragalactic counts display the Euclidean distribution down to
~1 mJy, where they start to increase, reaching a maximum value at
~0.4 mJy. Then, at S_ν < 0.3 mJy, the counts begin to decrease. This
is the effect we expect, because it is known that stars are
systematically brighter than galaxies, since they are much closer to
us, and this remains true also at infrared wavelengths. The stellar
counts follow the FSM to S_ν ~ 3 mJy, and the shape indicates that
there is a fraction of extragalactic objects classified as stars at
the faintest end. A closer look at the AKARI NEP survey source
counts was provided by Pearson et al. (2010).

Fig. 6. Euclidean normalized number counts in the N2 filter.
Squares represent all objects in the sample, asterisks represent
sources classified as stellar, and circles indicate extragalactic
sources. The line shows the stellar number counts predicted by the
FSM. Error bars represent the Poisson uncertainty in logarithmic
units.

Fig. 7. Euclidean normalized number counts in the S7 filter.
Squares represent all objects in the sample, asterisks represent
sources classified as stellar, and circles indicate extragalactic
sources. The line shows the stellar number counts predicted by the
FSM. Error bars represent the Poisson uncertainty in logarithmic
units.

Fig. 8. Euclidean normalized number counts in the L18 filter.
Squares represent all objects in the sample, asterisks represent
sources classified as stellar, and circles indicate extragalactic
sources. The line shows the stellar number counts predicted by the
FSM. Error bars represent the Poisson uncertainty in logarithmic
units.



6. Summary 

Measurements of the stellarity parameter carried out for
near-infrared observations are of sufficient quality to create
bimodal samples with precisely defined classes, which in this work
we associated with stars and galaxies. With this knowledge alone
we used these strict separating criteria to create training samples
as an input to obtain the classifier, and we tested its accuracy on
a test sample of objects detected in the AKARI NEP Deep field
in all narrow passbands. We set up a six-dimensional parameter
space with infrared color indexes, which have different separating
values for the two desired object classes. Our training sample
classifier performed well on the true classes of sources, with an
accuracy of 98% for stars and 90% for galaxies, when considering
infrared measurements alone. Moreover, after projecting the
results into two-dimensional color spaces, we showed
that the two classes overlap. However, the basic division between
stars and galaxies emerges, which is consistent with the
expected loci of the star and galaxy classes in the CC diagrams.
Nevertheless, a clear distinction is visible only in higher
dimensions. When assigning an optical value of sgc to all test
objects, we created a new classifier and compared the accuracy of
the new training sets against the IR ones. We chose the
optical sgc for confirmation because SExtractor was originally
designed to deal with optical data. Our results indicate that the
optical classifier works for multicolor IR data with less efficiency
than the IR classifier. However, we should keep in mind that the
Subaru observations were carried out in a much narrower
FOV than that of AKARI. Nevertheless, the results of the comparison
are still very good: 65% of objects are classified as stars by both
the optical and infrared classifiers, and 96% of IR-classified
galaxies are pinpointed as galaxies by the optical SVM. The discrepancy
for stars is probably caused by the fact that when observations 
move into the infrared wavelength regime, the optically bright 
stars start to fade away, while the optically faint objects start 
to emerge, overshadowing the previously bright stars. We also
suspect that optically bright stars are misclassified in the MIR
catalogs more often than other sources. As an alternative
confirmation of the accuracy of our division we created
Euclidean normalized source counts for the two selected classes 
of objects. At the brightest fluxes, stellar counts at all
wavelengths agree well with the theoretical predictions of the FSM,
especially for the NIR-N filters, where they follow the applied
model to ~2 mJy. In the MIR-L bands the stellar contribution
is very low. In addition, the source counts reveal traces of
positive evolution at faint fluxes at both NIR and MIR wavelengths.
Therefore it is safe to conclude that our infrared-based
classifier allows the successful selection of Galactic and
extragalactic objects for future analyses.
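
The consistency figures quoted above can be read as per-class
agreement fractions between two sets of labels. The short Python
sketch below shows one way to compute such fractions. It assumes
that the infrared labels are taken as the reference and that both
label lists refer to the same cross-matched objects; the function
name and the toy labels are hypothetical and are not taken from the
AKARI or Subaru catalogs.

```python
def per_class_agreement(reference_labels, other_labels):
    """Fraction of objects in each reference class given the same label by the other classifier."""
    if len(reference_labels) != len(other_labels):
        raise ValueError("both label lists must describe the same objects")
    totals, matches = {}, {}
    for ref, other in zip(reference_labels, other_labels):
        totals[ref] = totals.get(ref, 0) + 1
        if ref == other:
            matches[ref] = matches.get(ref, 0) + 1
    return {label: matches.get(label, 0) / totals[label] for label in totals}


# Toy labels, made up for illustration only.
ir_labels = ["star"] * 4 + ["galaxy"] * 5
optical_labels = ["star", "star", "star", "galaxy"] + ["galaxy"] * 4 + ["star"]
print(per_class_agreement(ir_labels, optical_labels))
```

Swapping the two arguments makes the optical labels the reference,
which in general gives different fractions, so the choice of
reference should be stated together with the quoted percentages.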